class: center, middle, inverse, title-slide

.title[
# The Normal Distribution
]
.subtitle[
## EDP 613
]
.author[
### Week 5
]

---

<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Register.StartupHook("TeX Jax Ready",function () {
  MathJax.Hub.Insert(MathJax.InputJax.TeX.Definitions.macros,{
    cancel: ["Extension","cancel"],
    bcancel: ["Extension","cancel"],
    xcancel: ["Extension","cancel"],
    cancelto: ["Extension","cancel"]
  });
});
</script>
# Idea

The area under a normal curve

- is equal to 1
- represents a population
- is probabilistic

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

# By extension

- The area under the curve between two values is a proportion of the total population

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

## Example

Assuming the area under the curve between 0 and 2 is 0.20, what is

a. the proportion of the population between 0 and 2?

b. the proportion of the population not between 0 and 2?

---

## Solution a.

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

## Solution b.

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

# Variables

We have

<br>

.pull-left[
<p id="center" style="color:#dd99d2; font-weight: bold; border:1px; border-style:solid; border-color:#f0b5d3; border-radius: 25px; padding: 0.3em;">
`\(\mu\)`<br><br>
population mean
</p>
]

--

.pull-right[
<p id="center" style="color:#f5ebd9; font-weight: bold; border:1px; border-style:solid; border-color:#f5ebd9; border-radius: 25px; padding: 0.3em;">
`\(\overline{Y}\)`<br><br>
sample mean
</p>
]

--

<br>
<br>
<br>

.pull-left[
<p id="center" style="color:#99d2dd; font-weight: bold; border:1px; border-style:solid; border-color:#99d2dd; border-radius: 25px; padding: 0.3em;">
`\(\sigma\)`<br><br>
population standard deviation
</p>
]

--

.pull-right[
<p id="center" style="color:#e5d9e3; font-weight: bold; border:1px; border-style:solid; border-color:#e1e1f9; border-radius: 25px; padding: 0.3em;">
`\(s\)`<br><br>
sample standard deviation
</p>
]

---

# The Empirical Rule: Idea

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

# The Empirical Rule: Statistic

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

# The Empirical Rule: Formula

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

## Example

Assume a population with

`$$\mu = 176$$`
`$$\sigma = 36$$`

is normal. Approximately what percentage of the values are between 104 and 248?

---

## Solution

>- The value 104 is two standard deviations below the mean since

`$$\begin{aligned} \mu - 2\sigma &= 176 - 2 \cdot 36\\ &= 104 \end{aligned}$$`

--

>- The value 248 is two standard deviations above the mean since

`$$\begin{aligned} \mu + 2\sigma &= 176 + 2 \cdot 36\\ &= 248 \end{aligned}$$`

--

>- So about 95% of the data points are between 104 and 248.

---

## Example

Assume a population with

`$$\mu = 176$$`
`$$\sigma = 36$$`

is normal. Between what two values will about 68% of the data points fall?

---

## Solution

>- The value 140 is one standard deviation below the mean since

`$$\begin{aligned} \mu - \sigma &= 176 - 36\\ &= 140 \end{aligned}$$`

--

>- The value 212 is one standard deviation above the mean since

`$$\begin{aligned} \mu + \sigma &= 176 + 36\\ &= 212 \end{aligned}$$`

--

>- So about 68% of the data points are between 140 and 212.

---

# The `\(z\)`-score

A `\(z\)`*-score* is a standard way to look at the normal curve.

--

- On their own, the values don't really mean anything

--

- Provides a common metric for most measures

--

When plotting a `\(z\)`-score

--

>- The points on the horizontal axis to the

>>- `\(\leftarrow\)` of the `\(\mu\)` have negative `\(z\)`-scores.

--

>>- `\(\rightarrow\)` of the `\(\mu\)` have positive `\(z\)`-scores.

--

>- The mean `\(\mu=\)` the median = the mode sits at the origin (middle)

--

>- Needs something to interpret it, like the *Standard Normal Table* (Appendix B; p. 375)

---

## Example

How much of a population is represented by the shaded area under the standard normal curve?
<img src="Slides-Week-5_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

## Solution

*idea*: The area less than `\(1.25\)` is equivalent to the entire area to the left of `\(\mu\)` added to the area between `\(\mu\)` and `\(1.25\)`

*standard normal table*: This is `\(0.5000 + 0.3944 = 0.8944\)`, implying that the shaded area represents approximately 89.44% of the population

<br>
<br>

<center>
<img src="snt1.png" width="255" height="326" alt="Standard Normal Table Example">
</center>

---

## Example

How much of a population is represented by the shaded area under the standard normal curve?

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

## Solution

*idea*: By symmetry, the area less than `\(-0.28\)` is equivalent to the area greater than `\(0.28\)`; this is added to the area greater than `\(0.94\)`

*standard normal table*: This is `\(0.3897 + 0.1736 = 0.5633\)`, implying that the shaded area represents approximately 56.33% of the population

<br>
<br>

.pull-left[
<center>
<img src="snt2a.png" width="196" height="326" alt="Standard Normal Table Example">
</center>
]

.pull-right[
<center>
<img src="snt2b.png" width="193" height="326" alt="Standard Normal Table Example">
</center>
]

---

## Example

How much of a population is represented by the shaded area under the standard normal curve?
<img src="Slides-Week-5_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---

## Solution

*idea*: The area between `\(-0.75\)` and `\(1.7\)` can be viewed as the area between `\(0\)` and `\(0.75\)` added to the area between `\(0\)` and `\(1.7\)`

*standard normal table*: This is `\(0.2734 + 0.4554 = 0.7288\)`, implying that the shaded area represents approximately 72.88% of the population

<br>
<br>

.pull-left[
<center>
<img src="snt3a.png" width="181" height="326" alt="Standard Normal Table Example">
</center>
]

.pull-right[
<center>
<img src="snt3b.png" width="190" height="326" alt="Standard Normal Table Example">
</center>
]

---

# Calculating the `\(z\)`-score

Let `\(Y\)` be a value from a normal distribution with a known mean and standard deviation. Then the `\(z\)`-score of `\(Y\)` is

--

<br>

.pull-left[
<p id="center" style="color:#e2b7bf; font-weight: bold; border:1px; border-style:solid; border-color:#e2b7bf; border-radius: 25px; padding: 0.3em;">
`\(z=\frac{Y-\overline{Y}}{s}\)`<br><br>
sample
</p>
]

--

.pull-right[
<p id="center" style="color:#c4b7e2; font-weight: bold; border:1px; border-style:solid; border-color:#c4b7e2; border-radius: 25px; padding: 0.3em;">
`\(z=\frac{Y-\mu}{\sigma}\)`<br><br>
population
</p>
]

---

## Example

A sample has mean `\(\overline{Y} = 47\)` years old and standard deviation `\(s = 3\)`. What proportion of the population is included between 50 and 55?
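
---

## Aside: the `\(z\)`-score formula in code

The two formulas above share one shape: subtract the mean, then divide by the standard deviation. A minimal sketch in Python follows, as an assumption for illustration only since these slides do not show code. The function name `z_score` and its parameter names are hypothetical.

```python
def z_score(y, center, spread):
    """Return the z-score (y - center) / spread.

    `center` is the mean (Y-bar or mu) and `spread` is the matching
    standard deviation (s or sigma). Names are illustrative only.
    """
    if spread <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - center) / spread


# Values from the example: mean 47, standard deviation 3.
z_50 = z_score(50, 47, 3)
z_55 = z_score(55, 47, 3)

print(z_50, round(z_55, 2))
```

The same function serves both the sample and the population case; only the values passed in for the mean and standard deviation change.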
---

## Solution

We can find the `\(z\)`-scores by

.pull-left[
`\begin{align} z_{50} &= \frac{50-47}{3}\\ &= 1 \end{align}`
]

.pull-right[
`\begin{align} z_{55} &= \frac{55-47}{3}\\ &\approx 2.67 \end{align}`
]

so we are looking at the area under the normal curve between 1 and 2.67

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

## Solution (using the area between the `\(\mu\)` and `\(z\)`)

*idea*: The area between 1 and 2.67 can be found by subtracting the area between the mean and 1 from the area between the mean and 2.67

*standard normal table*: This is `\(0.4962 - 0.3413 = 0.1549\)`, implying that approximately 15.49% of the population is between 50 and 55

.pull-left[
<center>
<img src="snt4a.png" width="180" height="326" alt="Standard Normal Table Example">
</center>
]

.pull-right[
<center>
<img src="snt4b.png" width="183" height="326" alt="Standard Normal Table Example">
</center>
]

---

## Solution (using the area beyond `\(z\)`)

*idea*: The area between 1 and 2.67 can be found by subtracting the area beyond 2.67 from the area beyond 1

*standard normal table*: This is `\(0.1587 - 0.0038 = 0.1549\)`, implying that approximately 15.49% of the population is between 50 and 55

<br>
<br>

.pull-left[
<center>
<img src="snt4c.png" width="183" height="326" alt="Standard Normal Table Example">
</center>
]

.pull-right[
<center>
<img src="snt4d.png" width="180" height="326" alt="Standard Normal Table Example">
</center>
]

---

# Note

If we have a `\(z\)`-score, `\(\mu\)`, and `\(\sigma\)`, we can rearrange our equation with basic algebra to find the value of a data point

`\begin{align} z &= \frac{Y-\mu}{\sigma}\\ z\cdot\sigma &= \frac{Y - \mu}{\sigma} \cdot \sigma\\ z\cdot\sigma &= \frac{Y - \mu}{\cancel{\sigma}} \cdot \cancel{\sigma}\\ z\cdot\sigma &= Y - \mu\\ z\cdot\sigma + \mu &= Y - \mu + \mu\\ z\cdot\sigma + \mu &= Y \cancel{- \mu + \mu} \\ z\cdot\sigma + \mu &= Y \\ \mu + z\cdot\sigma &= Y \\ Y &= \mu + z\cdot\sigma \end{align}`

---

# Finding the value of a data point

So to figure out the true value of a data point, we use

`$$Y = \mu + z\cdot\sigma$$`

---

## Example

The Centers for Disease Control and Prevention reported that diastolic blood pressures of adult women in the United States are approximately normally distributed with mean 80.5 and standard deviation 9.9. Find the 67th percentile of the blood pressures.

<img src="Slides-Week-5_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

## Solution

*idea*: Since we are trying to find the 67th percentile, the area under the standard normal curve can be split into two parts, namely everything

- below the percentile, an area of 0.67
- above the percentile, the remaining area of 0.33

The area of `\(0.67\)` below the percentile is equivalent to the entire area to the left of `\(\mu\)` added to the area between `\(\mu\)` and the unknown `\(z\)`-score, so the area between `\(\mu\)` and `\(z\)` is `\(0.67 - 0.50 = 0.17\)`

---

## Solution (continued)

*standard normal table*: The `\(z\)`-score with an area of `\(0.17\)` between `\(\mu\)` and `\(z\)` is `\(z = 0.44\)`, so

`\begin{align} Y &= 80.5 + 0.44\cdot 9.9\\ &\approx 84.86 \end{align}`

This means that the 67th percentile of diastolic blood pressures of adult women in the United States is approximately 84.86

<center>
<img src="snt5.png" width="180" height="326" alt="Standard Normal Table Example">
</center>

---

## That's it.

Let's take a break before working in R.
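
---

## Aside: checking the table lookups in code

The table-based answers in this deck can be cross-checked numerically. The sketch below is an assumption for illustration, not part of the course materials: it uses Python's standard-library `statistics.NormalDist` in place of the Standard Normal Table, and the variable names are hypothetical. Small differences from the slides are expected because the table rounds `\(z\)` to two decimal places.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist()

# Area between z = 1 and z = 2.67 (the "between 50 and 55" example).
# cdf(z) is the total area to the left of z.
area_between = std_normal.cdf(2.67) - std_normal.cdf(1.0)

# 67th percentile: find the z-score with 0.67 of the area to its left,
# then convert back to a data value with Y = mu + z * sigma.
mu, sigma = 80.5, 9.9
z_67 = std_normal.inv_cdf(0.67)
y_67 = mu + z_67 * sigma

print(round(area_between, 4))
print(round(z_67, 2), round(y_67, 2))
```

Working from `cdf` (area to the left) rather than "area between the mean and `\(z\)`" is a design choice of this sketch; subtracting 0.5 from `cdf(z)` for a positive `\(z\)` recovers the table's "mean to `\(z\)`" column.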